
Research Questions and Rationale

Our research questions are:

What, if any, is the correlation between hydrologic flashiness and chemical flashiness in Florida?

What aspects of a stream influence chemical flashiness in Florida?

What aspects of a storm influence chemical flashiness in Florida?

Dataset Information

For this analysis, we used two kinds of USGS National Water Information System (NWIS) data: high-frequency (instantaneous) data and county water-use data. The datasets for hypothesis 1 contain the site number, date and time of measurement, instantaneous discharge, and instantaneous nitrate. The datasets for hypotheses 2 and 3 contain the site number, date and time of measurement, instantaneous discharge, nitrate, pH, dissolved oxygen, and specific conductance, along with the population of each site's county and the county's surface-water withdrawals for thermoelectric, industrial, livestock, and irrigation uses. We trimmed every data frame to the periods in which all variables were measured and the records were most continuous, which made the analysis and visualization easier and more standardized. We chose Florida because it has enough sites with high-frequency nitrate and discharge data.

Exploratory Data Analysis and Wrangling

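library(dataRetrieval)  # whatNWISsites(), readNWISuv(), readNWISuse()
library(dplyr)          # %>%, filter(), select()

# Florida sites with instantaneous-value ("iv") records for each parameter code
discharge_sites <- whatNWISsites(stateCd = "FL", parameterCd = "00060", hasDataTypeCd = "iv")    # discharge
DO_sites <- whatNWISsites(stateCd = "FL", parameterCd = "00300", hasDataTypeCd = "iv")           # dissolved oxygen
conductance_sites <- whatNWISsites(stateCd = "FL", parameterCd = "00095", hasDataTypeCd = "iv")  # specific conductance
pH_sites <- whatNWISsites(stateCd = "FL", parameterCd = "00400", hasDataTypeCd = "iv")           # pH
nitrate_sites <- whatNWISsites(stateCd = "FL", parameterCd = "99133", hasDataTypeCd = "iv")      # nitrate plus nitrite (in situ)

# Hypothesis 1: sites that report both discharge and nitrate
hyp1_sites <- discharge_sites %>%
  filter(site_no %in% nitrate_sites$site_no)

# Hypotheses 2 and 3: sites that report discharge, DO, conductance, pH, and nitrate
hyp2_3_sites <- discharge_sites %>%
  filter(site_no %in% DO_sites$site_no) %>%
  filter(site_no %in% conductance_sites$site_no) %>%
  filter(site_no %in% pH_sites$site_no) %>%
  filter(site_no %in% nitrate_sites$site_no)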

These lines of code show how we built the site lists for hypothesis 1 and for hypotheses 2 and 3. For hypothesis 1 we required fewer variables so that more sites would qualify, allowing more robust tests. For hypotheses 2 and 3 we required more variables, accepting fewer sites in exchange for a more complete picture of each one.
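The chained filter() calls keep only the sites that appear in every parameter's site list, which amounts to intersecting sets of site numbers. A minimal base-R sketch of the same idea; the site numbers below are hypothetical placeholders, not real NWIS results:

```r
# Hypothetical site-number vectors standing in for the site_no columns
# returned by whatNWISsites() for each parameter
discharge_ids   <- c("001", "002", "003", "004", "005")
nitrate_ids     <- c("002", "003", "005", "006")
DO_ids          <- c("002", "003", "004", "005")
conductance_ids <- c("001", "002", "003", "005")
pH_ids          <- c("002", "005")

# Hypothesis 1: sites with both discharge and nitrate
hyp1_ids <- intersect(discharge_ids, nitrate_ids)

# Hypotheses 2 and 3: sites present in all five lists
hyp2_3_ids <- Reduce(intersect, list(discharge_ids, DO_ids, conductance_ids,
                                     pH_ids, nitrate_ids))

print(hyp1_ids)
print(hyp2_3_ids)
```

Every hypothesis 2/3 site is also a hypothesis 1 site, so each added variable can only keep the site list the same size or shrink it.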

We used readNWISuv() to read in the instantaneous-value data for each site in the hyp1_sites and hyp2_3_sites data frames.

We also retrieved NWIS water-use data for each county and combined them with the site data. We looked up the county for each site and entered it manually.

# Drop the first site in hyp1_sites (not used in this analysis)
hyp1_sites <- hyp1_sites[-1, ]

# Attach each site's county. The names were entered manually and must be in
# the same order as the rows of hyp1_sites, because cbind() matches by position.
hyp1_sites_county <- cbind(hyp1_sites, county_nm = c("Jefferson County","Madison County","Lafayette County","Levy County","Levy County","Gilchrist County", "Columbia County", "Columbia County","Citrus County","Citrus County","Citrus County","Marion County","Marion County", "Lee County", "Hendry County","Volusia County","Brevard County","Brevard County","Brevard County","Brevard County","Indian River County"))

# 2015 county water-use data for Florida
wateruse <- readNWISuse(stateCd = "Florida",
                        countyCd = "ALL",
                        years = "2015",
                        categories = c("IT", "LI", "TP", "IN", "PO"))
# IT = irrigation (total), LI = livestock, TP = total population,
# IN = industrial, PO = thermoelectric power (once-through cooling)

# Keep county name, population (thousands), and surface-water withdrawals (Mgal/d)
wateruse_cleaned <- wateruse %>%
  select(county_nm,
         total_pop = Total.Population.total.population.of.area..in.thousands,
         industrial = Industrial.total.self.supplied.withdrawals..surface.water..in.Mgal.d,
         thermoelectric = Thermoelectric.Power..Once.through.cooling..total.self.supplied.withdrawals..surface.water..in.Mgal.,
         livestock = Livestock.self.supplied.surface.water.withdrawals..fresh..in.Mgal.d,
         irrigation = Irrigation..Total.total.self.supplied.withdrawals..surface.water..in.Mgal.d)

# One row per site with its county's water use, sorted by station name
hyp2.data <- merge(hyp1_sites_county, wateruse_cleaned, by = "county_nm")
hyp2.data <- hyp2.data[order(hyp2.data$station_nm), ]
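Because cbind() attaches the manually entered county names by position, they must be listed in exactly the same order as the rows of hyp1_sites. A sketch of a more order-independent alternative, assuming a hypothetical lookup table keyed by site_no and joined with base R's merge(); all site numbers, names, and population values here are placeholders, not our data:

```r
# Hypothetical site table and county lookup (placeholders, not real NWIS data)
sites <- data.frame(site_no = c("S1", "S2", "S3"),
                    station_nm = c("Spring A", "River B", "Spring C"))
county_lookup <- data.frame(site_no = c("S3", "S1", "S2"),  # row order does not matter
                            county_nm = c("Levy County", "Marion County", "Levy County"))

# Hypothetical county-level water use (placeholder numbers)
wateruse_toy <- data.frame(county_nm = c("Levy County", "Marion County"),
                           total_pop = c(40, 340))

# Join counties to sites by site number, then water use to sites by county
sites_county  <- merge(sites, county_lookup, by = "site_no")
site_wateruse <- merge(sites_county, wateruse_toy, by = "county_nm")
site_wateruse <- site_wateruse[order(site_wateruse$station_nm), ]
print(site_wateruse)
```

Sites in the same county (here S2 and S3) receive the same county-level values, as in hyp2.data, but a mistyped or reordered lookup row can no longer shift counties onto the wrong sites.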

Preliminary Visualization 1

Above is a preliminary look at the discharge and nitrate trends at one of our sites, Blue Springs.

Preliminary Visualization 2

Above is a preliminary look at the trends at another site, Rainbow River near Dunnellon.

Preliminary Visualization 3

The figure above shows the distribution across Florida of the sites we analyze.

Analysis

Statistical Test 1: RBI-Chemostaticity Regression

We wrote a for loop that builds a vector of RBI (Richards-Baker flashiness index) values, one per site. We plan to run a regression between RBI and the chemostatic coefficients to investigate the relationship between hydrologic flashiness and chemical flashiness. We do not yet have RBI values because most of our sites lack catchment-size data.
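One common form of the Richards-Baker flashiness index, for a series of daily mean discharges q_1, ..., q_n, is the sum of absolute day-to-day changes |q_i - q_(i-1)| for i = 2..n divided by the sum of q_i over the same days. A minimal sketch of that form; the function name rbi(), the site names, and the discharge values are hypothetical, and this is not our actual loop. Under this definition, dividing every discharge by the same catchment area scales the numerator and the denominator by the same factor, so the index itself does not change:

```r
# Richards-Baker flashiness index (RBI) for a vector of daily mean discharges.
# One common form: the sum of absolute day-to-day changes divided by the
# total flow over the same days (days 2..n). Missing values are not handled.
rbi <- function(flow) {
  stopifnot(is.numeric(flow), length(flow) >= 2, all(flow >= 0))
  sum(abs(diff(flow))) / sum(flow[-1])
}

# Hypothetical daily mean discharges for two sites
flows <- list(
  site_A = c(10, 12, 30, 18, 11, 10),  # storm pulse: flashier
  site_B = c(5, 5, 6, 5, 5, 5)         # nearly steady flow
)

# One RBI value per site (sapply() in place of an explicit for loop)
rbi_by_site <- sapply(flows, rbi)
print(rbi_by_site)

# Dividing by a hypothetical catchment area of 250 leaves the index unchanged
print(rbi(flows$site_A / 250))
```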

Statistical Test 2: Chemostaticity Regression

## 
## Call:
## lm(formula = Flow_DV ~ Nitrate_DV, data = BLUEChemo)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.162  -8.168   0.401  11.183  27.815 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 188.7212     2.7161   69.48   <2e-16 ***
## Nitrate_DV  -36.4499     0.9224  -39.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.5 on 11902 degrees of freedom
## Multiple R-squared:  0.116,  Adjusted R-squared:  0.1159 
## F-statistic:  1562 on 1 and 11902 DF,  p-value: < 2.2e-16

To fit this model for one of our sites, Blue Springs, we grouped the measurements by date and computed the daily mean nitrate concentration, then did the same for discharge. We used daily means to simplify the C-Q plot and to estimate the regression coefficient from this condensed data frame.
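The daily averaging step, followed by a regression of log concentration on log discharge, can be sketched in base R as below. The data frame, its column names (dateTime, Flow, Nitrate), and the values are hypothetical stand-ins for the renamed readNWISuv() output. Regressing log C on log Q follows the common C-Q convention and differs from the model printed above, which regresses untransformed discharge on nitrate. Because aggregate() returns one row per calendar day, the fitted model has one observation per day:

```r
# Hypothetical instantaneous records: two readings per day over four days
iv <- data.frame(
  dateTime = as.POSIXct(c("2015-09-01 00:00", "2015-09-01 12:00",
                          "2015-09-02 00:00", "2015-09-02 12:00",
                          "2015-09-03 00:00", "2015-09-03 12:00",
                          "2015-09-04 00:00", "2015-09-04 12:00"), tz = "UTC"),
  Flow    = c(90, 110, 180, 220, 360, 440, 720, 880),
  Nitrate = c(1.5, 1.7, 0.7, 0.9, 0.3, 0.5, 0.1, 0.3)
)

# Collapse to one row per calendar day holding the daily mean of each variable
iv$Date <- as.Date(iv$dateTime, tz = "UTC")
daily <- aggregate(cbind(Flow, Nitrate) ~ Date, data = iv, FUN = mean)
print(daily)

# C-Q slope b from log(Nitrate) = log(a) + b * log(Flow)
cq_fit <- lm(log(Nitrate) ~ log(Flow), data = daily)
b <- unname(coef(cq_fit)["log(Flow)"])
print(b)
```

In this toy series the daily means follow Nitrate = 160 / Flow exactly, so the fitted slope is -1 (pure dilution).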

Visualization 1: C-Q Plot

To the left is the accompanying C-Q plot, with its regression line, for the same site as in Statistical Test 2. Both axes are log-transformed to spread out clustered points and make the plot easier to interpret.
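Log-transforming both axes also changes what a fitted straight line means: a line in log-log space is a power law in the original units, and in the usual C-Q convention its slope b is the chemostatic coefficient. Values of b near 0 indicate chemostatic behavior (concentration nearly independent of discharge), b < 0 indicates dilution, and b > 0 indicates enrichment (concentration rising with discharge):

```latex
C = a\,Q^{b}
\quad\Longleftrightarrow\quad
\log C = \log a + b \log Q
```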

Visualization 2: Hysteresis Plot

Above is the hysteresis plot of discharge and nitrate concentration, colored by dateTime, for a 2015 storm (or other high-discharge event) that we defined as lasting from 2015-08-28 to 2015-11-05. The loop first moves counterclockwise: as discharge rises and falls with initial overland flow, nitrate concentrations increase dramatically. Later, as more baseflow enters the river, nitrate concentrations decrease but settle slightly above their level at the start of the storm. This appears to be a flushing storm.
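One way to make the rotation direction less dependent on reading the plot is to compute the signed area enclosed by the storm's C-Q path with the shoelace formula. A sketch, assuming discharge on the x-axis and concentration on the y-axis, both min-max normalized to 0-1 and in time order; the helper names and storm values are hypothetical. The area is positive for a counterclockwise loop and negative for a clockwise one:

```r
# Min-max normalize a vector to the range 0-1 (assumes x is not constant)
normalize01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Signed area of the closed C-Q path (shoelace formula).
# x = normalized discharge, y = normalized concentration, in time order.
# Positive = counterclockwise loop, negative = clockwise loop.
hysteresis_signed_area <- function(flow, conc) {
  x <- normalize01(flow)
  y <- normalize01(conc)
  x_next <- c(x[-1], x[1])  # close the loop back to the first point
  y_next <- c(y[-1], y[1])
  sum(x * y_next - x_next * y) / 2
}

# Hypothetical storm: concentration is higher on the falling limb
flow <- c(1, 3, 5, 3, 1)
conc <- c(1, 1, 2, 4, 2)
area <- hysteresis_signed_area(flow, conc)
print(area)
print(if (area > 0) "counterclockwise" else "clockwise")
```

A loop that crosses itself (a figure eight) mixes both rotations, so its net signed area can be small or misleading even when each lobe is clear on the plot.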

Visualization 3: Chemostaticity and Population

The figure above shows how population and chemostaticity vary across sites. Each bubble is one site; its color represents the site's chemostaticity and its size represents the population of the site's county. Because the plot does not make clear whether population and chemostaticity are related, we ran an ANOVA.

Statistical Test 3: ANOVA

##             Df  Sum Sq Mean Sq F value Pr(>F)
## total_pop    1   99299   99299   1.626  0.219
## Residuals   17 1038438   61085

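According to the ANOVA (F = 1.63, p = 0.219), we do not have sufficient evidence to reject the null hypothesis, so we found no detectable effect of county population on a stream's chemostaticity. Failing to reject the null hypothesis does not show that no effect exists, especially with only 19 sites.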
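Because total_pop is numeric, the one-degree-of-freedom ANOVA above tests the same hypothesis as the slope of a simple linear regression on population: the F statistic equals the squared t statistic of the slope, and the p-values match. A sketch with hypothetical values (not our site data; the column name chemo is a placeholder) showing that aov() and lm() agree:

```r
# Hypothetical site-level data: county population (thousands) and chemostaticity
sites <- data.frame(total_pop = c(15, 40, 90, 150, 300, 600, 1200),
                    chemo     = c(-0.1, 0.05, -0.2, 0.1, 0.0, -0.15, 0.3))

aov_fit <- aov(chemo ~ total_pop, data = sites)
lm_fit  <- lm(chemo ~ total_pop, data = sites)

# F and p from the ANOVA table; t and p for the slope from the regression
aov_table <- summary(aov_fit)[[1]]
f_aov <- aov_table[["F value"]][1]
p_aov <- aov_table[["Pr(>F)"]][1]
t_lm  <- summary(lm_fit)$coefficients["total_pop", "t value"]
p_lm  <- summary(lm_fit)$coefficients["total_pop", "Pr(>|t|)"]

print(c(F = f_aov, t_squared = t_lm^2, p_aov = p_aov, p_lm = p_lm))
```

The reported p-value therefore addresses whether a linear trend in population exists; a nonlinear relationship would need a different model.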

Summary and Conclusions

Unfortunately, we have no definite conclusions for any of our hypotheses at this stage of the analysis. For hypothesis 1, we are still looking for a way to calculate RBI values without catchment-size data; if we obtain that data, we can run a regression between RBI and the chemostaticity regression coefficient. That regression could give a definite conclusion about the relationship between chemostaticity and hydrologic flashiness: either the two are correlated or they are not.

As for hypothesis 2, our preliminary graphs of possible relationships among water-use type, population, and chemostaticity show little to no correlation. It is unclear whether there is truly no correlation or whether other factors are at play; more analysis is needed.

In the meantime, we have made some purely qualitative observations about certain sites, specifically those that produced legible hysteresis plots. We excluded sites with extremely chaotic discharge patterns, where no specific major storm or discharge event could be identified.

Blue Springs is located in the least populated county of all our sites. The area appears very rural, but little surface water is withdrawn for livestock or irrigation, so we concluded that it is not a heavily agricultural site. The site lies in the middle of a state park. The storm we visualized in its hysteresis plot appears to be a flushing storm.

Caloosahatchee is located in the most populated county of all our sites. Much of the county's surface water is withdrawn for thermoelectric power, and some for irrigation. The sensor sits just offshore of a recreation park but is otherwise surrounded by urban land. A storm event here produced a noisy hysteresis plot, but a general flushing trend still appears in the data.

Fanning Springs lies geographically between the previous two sites, and its county's population is likewise intermediate. The county shows no major agricultural, industrial, or thermoelectric water use. The spring appears to be in a populated, semi-urban area, next to a major road. The storm event visualized for this site exhibited diluting behavior.

Madison Blue Spring has conditions similar to those at Fanning Springs: it lies in a medium-sized county without notable agricultural, industrial, or thermoelectric water use. The spring is inside a state park, though on its periphery, and is near an airport. The hysteresis plot for a storm at this site is somewhat difficult to read, but it appears to show flushing.

From just these four sites, it qualitatively appears that sites in or near parks exhibit flushing behavior, while the one site not associated with a park, Fanning Springs in its populated, semi-urban setting, exhibits diluting behavior. This is consistent with what we know about flushing and dilution: flushing generally occurs when storm runoff passes over land that holds nutrients it can carry to the stream, whereas dilution occurs when runoff comes largely from impervious surfaces that may not hold the same level of nutrients to carry off.